

**Amendments to the Specification:**

Please replace the paragraph beginning on line 14 of page 13 of the specification with the following amended paragraph:

In many applications it is desired to ~~extent~~ extend the processing time in the pipeline stage. For example, in a high-speed data acquisition and processing system such as the ones at the Large Hadron Collider (LHC) experiments at CERN, where 16- to 32-bit data per channel are received every 25 ns, a pipeline stage would not only need the time required to fetch the 32-bit input data[[.]] and to exchange the information with its neighbors (see Figure 3), but would also need the time required to reduce the data received from neighbors (2x2, or 4x4) in order to be able to send through the exit port every 25 ns a reasonable amount of reduced data through a reasonable number of lines.

Please replace the paragraph beginning on line 25 of page 14 of the specification with the following amended paragraph:

The 3D-Flow architecture is designed for applications where it is required to ~~extent~~ extend the processing time in one pipelined stage beyond the time interval between ~~to~~ two consecutive input sets of data. The architecture is based on a single type of replicated circuit cascaded through “bypass switches + a register.”

Please replace the paragraph beginning on line 28 of page 14 of the specification with the following amended paragraph:

The circuit can be a ~~commercial~~ commercially available component, which ~~require in this case to implement~~ requires external implementation of ~~externally~~ the “bypass switches + register,” or, when system performance of high throughput ~~are~~ is required, the circuit can be a 3D-Flow processor (see Section 5.1.3.2), which has an internal architecture with powerful I/O and instructions performing efficient data movement and has the “bypass switches + register” implemented internally~~, is more suitable to solve the latter tasks~~.

Please replace the paragraph beginning on line 32 of page 14 of the specification with the following amended paragraph:

~~Following there is the~~ What follows is a description of the 3D-Flow architecture based on the 3D-Flow processor. The use of the described “bypass switches + register” interfaced to a commercially available processor will implement the same 3D-Flow architecture~~, however~~. However, it will ~~be~~ have less performance since more ~~instruction~~ instructions will be needed (~~compared~~ relative to the one ~~architecture~~ based on the 3D-Flow processor), in order to move data across the system.

Please replace the paragraph beginning on line 19 of page 18 of the specification with the following amended paragraph:

Because the 3D-Flow approach is based on a single type of circuit, it is natural to keep this modularity with a single type of replicated component that does not require glue logic for its interconnection. For this reason, as well as the fact that ~~IC~~ integrated circuit design advances are rapid, it is best to retain it in IP (Intellectual Property) form written in generic VHDL reusable code so that it can be implemented at any time using any technology. In this way, it can be implemented at the last moment using the latest technology that will provide the best characteristics (low power dissipation, lower cost, smaller size, higher speed). See Section 5.4.1 for more information in regard to the 3D-Flow ASIC.

Please replace the paragraph beginning on line 28 of page 3 of the specification, with the following amended paragraph:

For a stage requiring the execution of an algorithm which is three times longer than the time interval between two consecutive input data, three identical circuits should be cascaded, and so on. Data and results flow synchronously from the first circuit as input of the system, through the “bypass switches + register” of the cascaded circuits, to ~~the last at the~~ output. Multi-channel systems have several linear arrays of cascaded circuits (or processors) side-by-side that can also be interconnected laterally.

Please replace the paragraph beginning on line 37 of page 67 of the specification with the following amended paragraph:

- LAL board design (4): front-end card (248 units) - ~~(Ref. [6] Sec. 4.1; Ref. [7] Sec. 2)~~; ECAL summary card (28 units) - ~~(Ref. [6], Sec. 4.2.2, Ref. [7], Sec. 3.2.1)~~; HCAL summary card (8 units) - ~~([6], Sec. 4.2.3; [7], Sec. 3.2.2)~~; selection card (18 units), selection controller card (2 units) - ~~([6], Sec. 4.3; [7], Sec. 3.3)~~;
- 3D-Flow board design (3D-Flow mixed-signal board (96 units) - ~~(Ref 1)~~, Sec. 6.1, 6.4, 6.5, and 6.6; [4], Sec. 5, and 6); Ref. [3], Sec. 6.2 for the digital board;

Please replace the paragraph beginning on line 7 of page 68 of the specification with the following amended paragraph:

- Bologna board design (4): front-end card (212 units) - ~~([8], Sec. 3.2; [9], Sec. 3, and Table 7)~~; ECAL L0 card (208 units) - ~~([8], Sec. 3.2; [9], Sec. 3, and Table 7)~~; HCAL L0 card (56 units) - ~~([8], Sec. 3.2; [9], Sec. 4, and Table 7)~~; Message dispatcher card (1 unit) - ~~([9], Sec. 3.6, and Table 7)~~; and
- CMA board design (6): Receiver cards (152 units); EI cards (152 units); JS cards (19 units); CEM cards (19 units); LTTC cards (19 units); ROC cards (19 units). Ref. [11], Sec. 2.

Please replace Table 9 on page 68, line 16, with the following amended table:

Table 9. Fast data acquisition and processing implementations: Features and Performances.

| Item                   | CMS | LAL | 3DF | BO |
|------------------------|-----|-----|-----|----|
| 2 x 2 Algorithm        |     | X   | X   |    |
| 3 x 3 Algorithm        | X   |     | X   | X  |
| Fully programmable     |     |     | X   |    |
| Add subsystems later   |     |     | X   |    |
| No boundary limitation |     |     | X   |    |
| Modular Scalable       |     |     | X   |    |
| Technology Independent |     |     | X   |    |

Please insert the following text on page 68 after Table 9:

### **References**

<sup>1</sup> Eisenhandler, "Hardware Triggers at the LHC," Fourth Workshop on Electronics for LHC Experiments, Rome, Italy, September 21-25, 1998, pp. 47-56.

<sup>2</sup> <http://atlasinfo.cern.ch/atlas/groups/daqtrig/tdr/tdr.html> (The ATLAS Level-1 Trigger Technical Design Report)

<sup>3</sup> <http://hep.physics.wisc.edu/wsmith/cms/Lehman_Cal.pdf>

Please replace the abstract on page 71, with the following abstract:

### **Abstract**

A system extends the execution time of a pipeline stage to a time longer than the time interval between two consecutive input data. Each processor in the system has an input and output port connected to a "bypass switch" (or multiplexer). Input data is sent either to a processor, for processing, or to a processor output port, in which case no processing is performed, through a register using at least one clock cycle to move data from register input to register output. For a single channel requiring an execution time twice the time interval between two consecutive input data, two processors are interconnected by the bypass switch. Data flows from the first processor at the input of the system, through the "bypass switches" of the interconnected processors, to output. The "bypass switches" are configured with respect to the processors such that the system data rate is independent of processor number.

Please delete pages 72-75 of the specification.